Taming Control Divergence in GPUs through Control Flow Linearization

Authors

  • Jayvant Anantpur
  • R. Govindarajan

Abstract

Branch divergence is a very common performance problem in GPGPU computing: the execution of diverging branches is serialized so that only one control flow path executes at a time. The existing hardware mechanism, which reconverges threads using a stack, causes duplicate execution of code for unstructured control flow graphs (CFGs). Moreover, the stack mechanism cannot effectively exploit the parallelism available among diverging branches, and the amount of nested divergence allowed is limited by the depth of the branch divergence stack. In this paper we propose a simple and elegant transformation that handles all of these problems. The transformation converts an unstructured CFG into a structured CFG without duplicating user code, incurring only a linear increase in the number of basic blocks and instructions. Our solution linearizes the CFG using a predicate variable. This mechanism reconverges divergent threads as early as possible and also reduces the depth of the reconvergence stack. The parallelism available in nested branches can be extracted effectively by scheduling the basic blocks so as to reduce the impact of stalls due to memory accesses. The transformation can also increase the execution efficiency of nested loops whose trip counts differ across threads. We implemented the proposed transformation at the PTX level using the Ocelot compiler infrastructure and evaluated it on various benchmarks, showing that it can be effective in handling the performance problems caused by divergence in unstructured CFGs.
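
To make the idea of linearizing a CFG with a predicate variable more concrete, the following Python sketch simulates one warp executing a small, hypothetical unstructured CFG (A branches to B or C, B branches to C or D, C falls through to D). It assumes an acyclic CFG and a hand-chosen topological block order; the names `CFG`, `ORDER` and `run_linearized` are illustrative, and the sketch is not claimed to be the paper's exact transformation, which operates on PTX and also handles loops. Each thread keeps a "next block" variable, and the warp issues every block once in the fixed order, with only the threads whose variable names that block active.

```python
# Sketch (assumption): executing an unstructured, acyclic CFG in a fixed block
# order, guarding each block with a per-thread "next block" predicate variable.
from typing import Callable, Dict, List

# Hypothetical CFG: A -> {B, C}, B -> {C, D}, C -> D, D -> exit.
# Each block body takes a thread id and returns the name of the next block.
Block = Callable[[int], str]

CFG: Dict[str, Block] = {
    "A": lambda tid: "B" if tid % 2 == 0 else "C",
    "B": lambda tid: "C" if tid % 4 == 0 else "D",
    "C": lambda tid: "D",
    "D": lambda tid: "EXIT",
}

# Linear (topological) order in which the warp issues the blocks.
ORDER: List[str] = ["A", "B", "C", "D"]


def run_linearized(num_threads: int) -> Dict[str, List[int]]:
    """Issue each block at most once, in ORDER, with an active mask made of the
    threads whose predicate variable names that block.
    Returns a mapping from block name to the list of active thread ids."""
    nxt = {tid: "A" for tid in range(num_threads)}  # per-thread predicate variable
    trace: Dict[str, List[int]] = {}
    for block in ORDER:
        active = [tid for tid in range(num_threads) if nxt[tid] == block]
        if not active:
            continue  # no thread selects this block, so it is skipped
        trace[block] = active
        for tid in active:
            nxt[tid] = CFG[block](tid)  # the block body sets the next target
    # With a topological ORDER, every thread has left the CFG by now.
    assert all(target == "EXIT" for target in nxt.values())
    return trace


if __name__ == "__main__":
    for block, active in run_linearized(8).items():
        print(block, active)
```

For an 8-thread warp, block C is issued once with threads 0, 1, 3, 4, 5 and 7 active. A stack that reconverges at the immediate post-dominator (D) would instead typically issue C twice: once for the threads arriving directly from A and once for those arriving via B, which is the duplicate execution the abstract refers to.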

Similar resources

Investigating Warp Size Impact in GPUs

There are a number of design decisions that impact a GPU's performance. Among such decisions, choosing the right warp size can deeply influence the rest of the design. Small warps reduce the performance penalty associated with branch divergence at the expense of a reduction in memory coalescing. Large warps enhance memory coalescing significantly but also increase branch divergence. This leaves ...

Stack-less SIMT reconvergence at low cost

Parallel architectures following the SIMT model such as GPUs benefit from application regularity by issuing concurrent threads running in lockstep on SIMD units. As threads take different paths across the control-flow graph, lockstep execution is partially lost, and must be regained whenever possible in order to maximize the occupancy of SIMD units. In this paper, we propose a technique to hand...

MHD Flow and Heat Transfer Analysis of Micropolar Fluid through a Porous Medium between Two Stretchable Disks Using Quasi-Linearization Method

In this paper, a comprehensive numerical study is presented for studying the MHD flow and heat transfer characteristics of non-Newtonian micropolar fluid through a porous medium between two stretchable porous disks. The system of governing equations is converted into coupled nonlinear ordinary ones through a similarity transformation, which is then solved using Quasi-linearization ...

Optimal Control of Nonlinear Multivariable Systems

This paper concerns a study on the optimal control for nonlinear systems. An appropriate alternative in order to alleviate the nonlinearity of a system is the exact linearization approach. In this fashion, the nonlinear system has been linearized using input-output feedback linearization (IOFL). Then, by utilizing the well developed optimal control theory of linear systems, the compensated ...

Adaptive Input-Output Linearization Control of pH Processes

pH control is a challenging problem due to its highly nonlinear nature. In this paper the performances of two different adaptive global linearizing controllers (GLC) are compared. Least squares technique has been used for identifying the titration curve. The first controller is a standard GLC based on material balances of each species. For implementation of this controller a nonlinear state...

Publication date: 2014